Extracting WordNet-like Top Concepts from Explanatory Dictionaries

Authors

  • Hiram Calvo
  • Alexander Gelbukh
  • Juan de Dios
Abstract

Correct interpretation of a text frequently requires knowledge of the semantic categories of nouns, especially in languages with free word order. For example, in Spanish the phrases pintó un cuadro un pintor (lit. painted a picture a painter) and pintó un pintor un cuadro (lit. painted a painter a picture) mean the same thing: ‘a painter painted a picture’. The only way to tell the subject from the object is to know that pintor ‘painter’ is a causal agent and cuadro ‘picture’ is a thing. We present a method for extracting semantic information of this kind from existing machine-readable, human-oriented explanatory dictionaries. First, we extract an is-a hierarchy from the dictionary and manually mark the categories of a few top-level concepts. Then, for a given word, we follow the hierarchy upward until we find a concept whose semantic category is known. Applying this procedure to two different human-oriented Spanish dictionaries yields additional information compared with using Spanish EuroWordNet alone. In addition, we show the results of an experiment conducted to evaluate the similarity of word classification with this method.
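The upward walk described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `find_category`, the `max_depth` guard, the cycle check, and the toy hypernym entries for pintor and cuadro are all assumptions made for illustration. A real is-a hierarchy would be extracted from dictionary definitions, and a word may have several hypernyms rather than the single one assumed here.

```python
from typing import Dict, Optional


def find_category(
    word: str,
    hypernym_of: Dict[str, str],
    known_categories: Dict[str, str],
    max_depth: int = 50,
) -> Optional[str]:
    """Follow is-a links upward until a concept with a known category is found.

    Returns None if the chain ends, loops back on itself, or exceeds max_depth
    before reaching a manually categorized concept.
    """
    seen = set()
    current: Optional[str] = word
    while current is not None and current not in seen and len(seen) < max_depth:
        if current in known_categories:
            return known_categories[current]
        seen.add(current)
        # Assumption: each word has at most one hypernym in this toy hierarchy.
        current = hypernym_of.get(current)
    return None


# Hypothetical is-a links, invented for illustration (not taken from a dictionary).
hypernym_of = {
    "pintor": "artista",
    "artista": "persona",
    "cuadro": "obra",
    "obra": "cosa",
}

# Manually marked top-level concepts (hypothetical labels following the abstract).
known_categories = {
    "persona": "causal agent",
    "cosa": "thing",
}

if __name__ == "__main__":
    for w in ("pintor", "cuadro"):
        print(w, "->", find_category(w, hypernym_of, known_categories))
```

The cycle check is included because definitions in human-oriented dictionaries can be circular; how the original method handles such cases is not stated in the abstract.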


Similar resources

Extracting Lexico-conceptual Knowledge for Developing Persian WordNet

Semantic lexicons and lexical ontologies are major resources in natural language processing. Developing such resources is a time-consuming task for which some automatic methods have been proposed. This paper describes some methods used in the semi-automatic development of FarsNet, a lexical ontology for the Persian language. FarsNet includes the Persian WordNet with more than 10000 synsets of nouns,...


Word Association Thesaurus As a Resource for Building WordNet

The goal of the present paper is to report on ongoing research into applying psycholinguistic resources to building a WordNet-like lexicon of the Russian language. We survey the different kinds of linguistic data that can be extracted from a Word Association Thesaurus, a resource representing the results of a large-scale free association test. In addition, we will give a comparison o...


Processing and extracting data from an open dictionary of the Portuguese language

Synonyms dictionaries are useful resources for natural language processing. Unfortunately their availability in digital format is limited, as publishing companies do not release their dictionaries in open digital formats. Dicionário-Aberto (Simões and Farinha, 2010) is an open and free digital synonyms dictionary for the Portuguese language. It is under public domain and in textual digital form...


Adjectives in RussNet

This paper deals with the problem of structuring adjectives in a wordnet. We will present several methods of dealing with this problem based on the usage of different language resources: frequency lists, text corpora, word association norms, and explanatory dictionaries. The work has been developed within the framework of the RussNet project aiming at building a wordnet for Russian. Three types...


Development of the Hungarian WordNet Ontology and its Application to Information Extraction

This paper presents an outline of the construction process of the Hungarian WordNet Ontology, and the description of an information extraction application utilizing the ontology. and MorphoLogic) in a 3-year project funded by the European Union ECOP program (GVOP-AKF-2004-3.1.1.) The Princeton WordNet (WN) linguistic ontology ([1]) has become a standard and an invaluable semantic resource withi...




Publication date: 2008